
Conversation


codeflash-ai bot commented Oct 24, 2025

📄 9% (0.09x) speedup for process_cycle in stanza/models/common/chuliu_edmonds.py

⏱️ Runtime: 12.7 milliseconds → 11.6 milliseconds (best of 35 runs)
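(For reference, 12.7 ms / 11.6 ms ≈ 1.09, which matches the roughly 9% speedup reported in the title.)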

📝 Explanation and details

The optimized code achieves a 9% speedup through several key optimizations that reduce redundant computations and memory operations:

**Key optimizations:**

1. **Reduced repeated array indexing**: The original code repeatedly computed `scores[cycle]` and `scores[noncycle]` in multiple places. The optimized version stores these as `scores_cycle` and `scores_noncycle` variables, eliminating redundant indexing operations.

2. **Replaced expensive `np.pad` with manual allocation**: The most significant optimization replaces `np.pad(subscores, ((0,1), (0,1)), 'constant')` with manual allocation using `np.zeros()` and direct assignment (see the sketch after this list). `np.pad` carries overhead for handling general padding scenarios, while direct allocation and assignment is more efficient for this specific use case.

3. **Precomputed array lengths and indices**: Added `len_cycle`, `len_noncycle`, and `idx` variables to avoid repeated shape computations and `np.arange()` calls.

4. **Split complex operations**: The original computed `metanode_head_scores` in one complex line combining multiple operations. The optimized version splits this into separate steps, allowing better memory management and potentially more efficient evaluation of the intermediate arrays.
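Below is a minimal, hedged sketch of optimizations 1 and 2 under illustrative inputs (the variable names mirror the description above but are not the exact diff from this PR): it caches the fancy-indexed row blocks once and replaces the general-purpose `np.pad` call with a preallocated array plus a block copy, which yields an identical padded matrix.

```python
import numpy as np

# Toy stand-ins for the real arguments of process_cycle (illustrative only).
rng = np.random.default_rng(0)
scores = rng.random((6, 6))
cycle = np.array([False, True, True, False, False, False])
noncycle = ~cycle

# (1) Cache the fancy-indexed blocks instead of recomputing scores[cycle]
#     and scores[noncycle] each time they are needed.
scores_cycle = scores[cycle]        # rows for nodes inside the cycle
scores_noncycle = scores[noncycle]  # rows for nodes outside the cycle
subscores = scores_noncycle[:, noncycle]

# (2a) Original padding: general-purpose np.pad appends a zero row and column.
padded_old = np.pad(subscores, ((0, 1), (0, 1)), 'constant')

# (2b) Optimized padding: allocate the final shape once and copy the block in.
len_noncycle = subscores.shape[0]
padded_new = np.zeros((len_noncycle + 1, len_noncycle + 1), dtype=subscores.dtype)
padded_new[:len_noncycle, :len_noncycle] = subscores

# Both approaches produce the same contracted-score scaffold.
assert np.array_equal(padded_old, padded_new)
```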

**Performance characteristics based on test results:**

- The optimizations are most effective for **medium to large-scale graphs** (like the 1000-node test cases), where the repeated indexing and padding operations become more expensive
- For **sparse cycles** (few nodes in the cycle), the reduced indexing overhead provides consistent benefits
- The manual padding replacement is particularly beneficial when the contracted graph is large, as seen in cases with many non-cycle nodes

The optimizations maintain identical functionality while reducing computational overhead through more efficient memory access patterns and elimination of redundant operations.
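To make the `np.pad` overhead claim concrete, here is a rough micro-benchmark sketch (absolute timings depend on machine and NumPy version, and it isolates only the padding step rather than the whole `process_cycle` call):

```python
import timeit

import numpy as np

# Matrix size roughly comparable to the contracted graph in the 1000-node tests.
a = np.random.rand(500, 500)

def pad_with_np_pad():
    # General-purpose padding: one zero row and one zero column appended.
    return np.pad(a, ((0, 1), (0, 1)), 'constant')

def pad_with_prealloc():
    # Preallocate the final shape and copy the block in directly.
    out = np.zeros((a.shape[0] + 1, a.shape[1] + 1), dtype=a.dtype)
    out[:-1, :-1] = a
    return out

print("np.pad:      ", timeit.timeit(pad_with_np_pad, number=2000))
print("preallocate: ", timeit.timeit(pad_with_prealloc, number=2000))
```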

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 24 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import numpy as np
# imports
import pytest
from stanza.models.common.chuliu_edmonds import process_cycle

# unit tests

# -------------------- Basic Test Cases --------------------

def test_basic_two_node_cycle():
    # Smallest possible cycle: 2 nodes, both in cycle
    tree = np.array([1, 0])
    cycle = np.array([True, True])
    scores = np.array([[0, 2],
                       [3, 0]])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)
    # The contracted score should be 0, as there are no non-cycle nodes
    assert len(noncycle_locs) == 0
    assert subscores.shape == (1, 1) and subscores[0, 0] == 0

def test_basic_three_node_cycle_and_one_noncycle():
    # 3 nodes in cycle, 1 node not in cycle
    tree = np.array([1, 2, 0, 0])
    cycle = np.array([True, True, True, False])
    scores = np.array([
        [0, 1, 2, 7],
        [3, 0, 4, 8],
        [5, 6, 0, 9],
        [10, 11, 12, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)


def test_basic_one_node_cycle():
    # Single node cycle (self-loop)
    tree = np.array([0, 1, 2])
    cycle = np.array([True, False, False])
    scores = np.array([
        [0, 1, 2],
        [3, 0, 4],
        [5, 6, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)
    # The contracted scores should match the original for noncycle-noncycle
    noncycle = ~cycle
    assert np.array_equal(subscores[:-1, :-1], scores[noncycle][:, noncycle])

# -------------------- Edge Test Cases --------------------

def test_edge_all_nodes_in_cycle():
    # All nodes are in the cycle
    tree = np.array([1, 2, 0])
    cycle = np.array([True, True, True])
    scores = np.array([
        [0, 1, 2],
        [3, 0, 4],
        [5, 6, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)

def test_edge_cycle_at_end():
    # Cycle is at the end of the array
    tree = np.array([1, 2, 3, 4, 3, 5])
    cycle = np.array([False, False, False, True, True, False])
    scores = np.array([
        [0, 1, 2, 3, 4, 5],
        [6, 0, 7, 8, 9, 10],
        [11, 12, 0, 13, 14, 15],
        [16, 17, 18, 0, 19, 20],
        [21, 22, 23, 24, 0, 25],
        [26, 27, 28, 29, 30, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)

def test_edge_cycle_at_start():
    # Cycle is at the start of the array
    tree = np.array([1, 0, 3, 2, 4])
    cycle = np.array([True, True, False, False, False])
    scores = np.array([
        [0, 1, 2, 3, 4],
        [5, 0, 6, 7, 8],
        [9, 10, 0, 11, 12],
        [13, 14, 15, 0, 16],
        [17, 18, 19, 20, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)

def test_edge_cycle_with_negative_scores():
    # Cycle with negative edge scores
    tree = np.array([1, 2, 0, 3])
    cycle = np.array([True, True, True, False])
    scores = np.array([
        [0, -2, -3, 4],
        [-1, 0, -4, 5],
        [-5, -6, 0, 6],
        [7, 8, 9, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)


def test_edge_cycle_with_single_noncycle():
    # 1 node in cycle, 1 node not in cycle
    tree = np.array([0, 1])
    cycle = np.array([True, False])
    scores = np.array([[0, 1], [2, 0]])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)

# -------------------- Large Scale Test Cases --------------------


def test_large_scale_half_cycle():
    # Large graph, half nodes in cycle, half not
    n = 1000
    tree = np.arange(n)
    cycle = np.zeros(n, dtype=bool)
    cycle[:n//2] = True
    scores = np.random.uniform(-100, 100, size=(n, n))
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)
    # After contraction: (n//2 noncycle + 1 metanode) x (n//2 noncycle + 1 metanode)
    expected_size = (n//2 + 1, n//2 + 1)
    assert subscores.shape == expected_size

def test_large_scale_all_cycle():
    # Large graph, all nodes in cycle
    n = 1000
    tree = np.roll(np.arange(n), -1)
    cycle = np.ones(n, dtype=bool)
    scores = np.random.normal(0, 1, size=(n, n))
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)

def test_large_scale_sparse_cycle():
    # Large graph, sparse cycle (every 100th node is in the cycle)
    n = 1000
    tree = np.arange(n)
    cycle = np.zeros(n, dtype=bool)
    cycle[::100] = True
    scores = np.random.randint(-100, 100, size=(n, n))
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)
    expected_cycle_count = n // 100 + (1 if n % 100 != 0 else 0)
    expected_noncycle_count = n - expected_cycle_count
    assert len(cycle_locs) == expected_cycle_count
    assert len(noncycle_locs) == expected_noncycle_count

# -------------------- Miscellaneous/Robustness Cases --------------------

def test_invalid_input_raises():
    # Test input shape mismatch
    tree = np.array([0, 1])
    cycle = np.array([True])
    scores = np.array([[0, 1], [2, 0]])
    with pytest.raises(IndexError):
        process_cycle(tree, cycle, scores)

def test_cycle_and_tree_length_mismatch():
    # cycle and tree length mismatch should raise
    tree = np.array([0, 1, 2])
    cycle = np.array([True, False])
    scores = np.array([[0, 1, 2], [3, 0, 4], [5, 6, 0]])
    with pytest.raises(IndexError):
        process_cycle(tree, cycle, scores)

def test_scores_wrong_shape_raises():
    # scores not square or not matching tree size should raise
    tree = np.array([0, 1, 2])
    cycle = np.array([True, False, True])
    scores = np.array([[0, 1], [2, 0]])
    with pytest.raises(IndexError):
        process_cycle(tree, cycle, scores)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import numpy as np
# imports
import pytest
from stanza.models.common.chuliu_edmonds import process_cycle

# unit tests

# ---------------- BASIC TEST CASES ----------------

def test_basic_single_cycle():
    # Simple 4-node graph with a cycle between nodes 1 and 2
    tree = np.array([0, 2, 1, 0])
    cycle = np.array([False, True, True, False])
    scores = np.array([
        [0, 1, 2, 3],
        [4, 0, 5, 6],
        [7, 8, 0, 9],
        [10, 11, 12, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)


def test_basic_all_cycle():
    # All nodes form a cycle
    tree = np.array([1,2,0])
    cycle = np.array([True, True, True])
    scores = np.array([
        [0, 1, 2],
        [3, 0, 4],
        [5, 6, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)

# ---------------- EDGE TEST CASES ----------------


def test_edge_single_node_cycle():
    # Single node, is a cycle
    tree = np.array([0])
    cycle = np.array([True])
    scores = np.array([[0]])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)

def test_edge_disconnected_graph():
    # Disconnected graph: one node is not reachable
    tree = np.array([1, 1, 2])
    cycle = np.array([False, True, True])
    scores = np.array([
        [0, 1, 2],
        [3, 0, 4],
        [5, 6, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)

def test_edge_cycle_at_end():
    # Cycle at the end of the node list
    tree = np.array([0, 2, 3, 2])
    cycle = np.array([False, False, True, True])
    scores = np.array([
        [0, 1, 2, 3],
        [4, 0, 5, 6],
        [7, 8, 0, 9],
        [10, 11, 12, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)

def test_edge_cycle_at_start():
    # Cycle at the start of the node list
    tree = np.array([1, 0, 2, 3])
    cycle = np.array([True, True, False, False])
    scores = np.array([
        [0, 1, 2, 3],
        [4, 0, 5, 6],
        [7, 8, 0, 9],
        [10, 11, 12, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)


def test_edge_all_true_cycle():
    # cycle array is all True, tree is arbitrary
    tree = np.array([2, 0, 1, 3])
    cycle = np.array([True, True, True, True])
    scores = np.array([
        [0, 1, 2, 3],
        [4, 0, 5, 6],
        [7, 8, 0, 9],
        [10, 11, 12, 0]
    ])
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)


def test_large_scale_random_cycle():
    # Large graph with random cycle
    np.random.seed(42)
    n = 1000
    tree = np.random.randint(0, n, size=n)
    cycle = np.zeros(n, dtype=bool)
    # Make a cycle among first 10 nodes
    cycle[:10] = True
    scores = np.random.randn(n, n)
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)

def test_large_scale_all_cycle():
    # All nodes in cycle, large graph
    n = 500
    tree = np.arange(1, n+1) % n
    cycle = np.ones(n, dtype=bool)
    scores = np.random.rand(n, n)
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)


def test_large_scale_sparse_cycle():
    # Sparse cycle, only 2 nodes in a 1000-node graph
    n = 1000
    tree = np.zeros(n, dtype=int)
    cycle = np.zeros(n, dtype=bool)
    cycle[123] = True
    cycle[987] = True
    scores = np.random.rand(n, n)
    subscores, cycle_locs, noncycle_locs, metanode_heads, metanode_deps = process_cycle(tree, cycle, scores)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-process_cycle-mh4g9y2l` and push.

codeflash-ai bot requested a review from mashraf-222 on October 24, 2025 06:08
codeflash-ai bot added the ⚡️ codeflash (Optimization PR opened by Codeflash AI) label on Oct 24, 2025
